Education
Question: What is the percentage of people who have a Bachelor's degree?
Or is there another more efficient method?
# Assume the CSV file is already converted into a dataframe
# Select the 'education' column as a Series
column_values = df['education']
# Boolean mask
# It compares each element of `column_values` with the value `"Bachelors"`
boolean_mask = column_values == "Bachelors"
# Using the boolean mask for indexing
# When you use a boolean mask to index a Series or DataFrame, only the elements corresponding to `True` values in the mask are selected.
filtered_series = column_values[boolean_mask]
# Finally, count how many times each remaining value occurs
filtered_series.value_counts()
Aight, this was wrong. We were meant to get the percentage, not the count.
So, is there a straightforward way in pandas to calculate the percentage of something?
Okay, there's no dedicated percentage method, so fall back on the formula:
Percentage = (Part / Whole) * 100
Wait, what does value_counts() do?
Okay, it counts how many times each unique value occurs.
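To see what that looks like, here is a small sketch. The sample values are made up for illustration and only stand in for the real `df['education']` column.

```python
import pandas as pd

# Hypothetical sample data standing in for df['education']
education = pd.Series(
    ["Bachelors", "HS-grad", "Bachelors", "Masters", "HS-grad", "Bachelors"]
)

# value_counts() returns a Series: one row per unique value,
# holding the number of times that value occurs
counts = education.value_counts()
print(counts)

# A single count can be looked up by its label
bachelors_count = counts["Bachelors"]
print(bachelors_count)
```

So the result is a Series indexed by the unique values, not a single number.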
So, the best way to do this is to find a pandas function that counts every element in a column; that'd be the whole.
The count from filtered_series.value_counts() would then be the part.
# Select the 'education' column as a Series
column_values = df['education']
# count() returns the number of non-null entries in the column: the whole
whole = column_values.count()
# Boolean mask
# It compares each element of `column_values` with the value `"Bachelors"`
boolean_mask = column_values == "Bachelors"
# Using the boolean mask for indexing
# When you use a boolean mask to index a Series or DataFrame, only the elements corresponding to `True` values in the mask are selected.
filtered_series = column_values[boolean_mask]
# value_counts() returns a Series, so pull out the single "Bachelors" count: the part
part = filtered_series.value_counts()["Bachelors"]
percentage = (part / whole) * 100
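As for the "more efficient method" question from the top: one possible shortcut is to skip the filtering step entirely. This is only a sketch with made-up sample data (the real `df` comes from the CSV), and it assumes the label is spelled exactly "Bachelors".

```python
import pandas as pd

# Hypothetical sample data; the real df is loaded from the CSV
df = pd.DataFrame(
    {"education": ["Bachelors", "HS-grad", "Bachelors", "Masters"]}
)

# Option 1: True counts as 1 and False as 0, so the mean of a
# boolean mask is the fraction of rows that match
percentage = (df["education"] == "Bachelors").mean() * 100
print(percentage)

# Option 2: normalize=True makes value_counts() return fractions
# instead of raw counts, for every unique value at once
percentages = df["education"].value_counts(normalize=True) * 100
print(percentages["Bachelors"])
```

One difference to keep in mind: if the column has missing values, Option 1 keeps those rows in the denominator, while count() and value_counts() (by default) leave them out, so the results can differ slightly.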